Linguistic annotation in / for corpus linguistics

Author

Andrea L. Berez

Abstract

This article surveys linguistic annotation in corpora and corpus linguistics. We first define the concept of 'corpus' as a radial category and then, in Section 2, discuss a variety of kinds of information for which corpora are annotated and that are exploited in contemporary corpus linguistics. Section 3 then exemplifies many current formats of annotation with an eye to highlighting both the diversity of formats currently available and the emergence of XML annotation as, for now, the most widespread form of annotation. Section 4 summarizes and concludes with desiderata for future developments.

Similar resources

Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora

There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...

Linguistic Annotation: from Links to Cross-Layer Lexicons

Lexicons have always been part of linguistics, the more in the era of computational linguistics. Complex, deep linguistic annotation has emerged as an important research phenomenon relatively recently. Even though various annotation schemes ([10], [13], [15], [16], [17]) have been developed containing some sort of explicit or implicit reference to a “lexicon”, none has presented a coherent and ...

Detecting Annotation Errors in Spoken Language Corpora

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in part-of-speech and other positional annotation (van Halteren, 2000; Eskin, 2000; Dickinson and Meurers, 2003a), more recently work has also started to address errors in syntactic and other...

Annotating Discourse Anaphora

In this paper, we present preliminary work on corpus-based anaphora resolution of discourse deixis in German. Our annotation guidelines provide linguistic tests for locating the antecedent, and for determining the semantic types of both the antecedent and the anaphor. The corpus consists of selected speaker turns from the Europarl corpus.

Linguistic Resources and Software for Shallow Processing

This paper presents linguistic resources and software consisting of a hand-tagged corpus of 1 million tokens and several shallow-processing annotation tools.

Publication date: 2014
